Report of NEWS 2010 Transliteration Mining Shared Task
نویسندگان
چکیده
This report documents the details of the Transliteration Mining Shared Task that was run as a part of the Named Entities Workshop (NEWS 2010), an ACL 2010 workshop. The shared task featured mining of name transliterations from the paired Wikipedia titles in 5 different language pairs, specifically, between English and one of Arabic, Chinese, Hindi Russian and Tamil. Totally 5 groups took part in this shared task, participating in multiple mining tasks in different languages pairs. The methodology and the data sets used in this shared task are published in the Shared Task White Paper [Kumaran et al, 2010]. We measure and report 3 metrics on the submitted results to calibrate the performance of individual systems on a commonly available Wikipedia dataset. We believe that the significant contribution of this shared task is in (i) assembling a diverse set of participants working in the area of transliteration mining, (ii) creating a baseline performance of transliteration mining systems in a set of diverse languages using commonly available Wikipedia data, and (iii) providing a basis for meaningful comparison and analysis of trade-offs between various algorithmic approaches used in mining. We believe that this shared task would complement the NEWS 2010 transliteration generation shared task, in enabling development of practical systems with a small amount of seed data in a given pair of languages.
منابع مشابه
Whitepaper of NEWS 2010 Shared Task on Transliteration Mining
Transliteration is generally defined as phonetic translation of names across languages. Machine Transliteration is a critical technology in many domains, such as machine translation, cross-language information retrieval/extraction, etc. Recent research has shown that high quality machine transliteration systems may be developed in a language-neutral manner, using a reasonably sized good quality...
متن کاملReport of NEWS 2010 Transliteration Generation Shared Task
This report documents the Transliteration Generation Shared Task conducted as a part of the Named Entities Workshop (NEWS 2010), an ACL 2010 workshop. The shared task features machine transliteration of proper names from English to 9 languages and from 3 languages to English. In total, 12 tasks are provided. 7 teams from 5 different countries participated in the evaluations. Finally, 33 standar...
متن کاملTransliteration Mining with Phonetic Conflation and Iterative Training
This paper presents transliteration mining on the ACL 2010 NEWS workshop shared transliteration mining task data. Transliteration mining was done using a generative transliteration model applied on the source language and whose output was constrained on the words in the target language. A total of 30 runs were performed on 5 language pairs, with 6 runs for each language pair. In the presence of...
متن کاملA Statistical Model for Unsupervised and Semi-supervised Transliteration Mining
We propose a novel model to automatically extract transliteration pairs from parallel corpora. Our model is efficient, language pair independent and mines transliteration pairs in a consistent fashion in both unsupervised and semi-supervised settings. We model transliteration mining as an interpolation of transliteration and non-transliteration sub-models. We evaluate on NEWS 2010 shared task d...
متن کاملStatistical models for unsupervised, semi-supervised and supervised transliteration mining
We present a generative model that efficiently mines transliteration pairs in a consistent fashion in three different settings, unsupervised, semi-supervised and supervised transliteration mining. The model interpolates two sub-models, one for the generation of transliteration pairs and one for the generation of non-transliteration pairs (i.e. noise). The model is trained on noisy unlabelled da...
متن کامل